November 20th, 2017

Overview

  1. Problem Overview
  2. What Data is Used?
  3. Regression Analysis
  4. Spatial Analysis

Problem Overview

Question 1

Does opening a new station affect ridership at existing stations near the newly opened station?

  • Could not frame this as a 'spatial data' problem; it looks more like a distance problem
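Treating Question 1 as a distance problem starts with measuring how far each existing station is from a new one. The sketch below uses the haversine great-circle formula on latitude/longitude pairs. The station coordinates, the `radius_m` cutoff, and both function names are hypothetical illustrations, not part of the original analysis.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_within(new_station, stations, radius_m):
    """Existing stations within radius_m of a new station, nearest first.

    new_station: (lat, lon) of the newly opened station (hypothetical input).
    stations: dict of station id -> (lat, lon); the new station itself should
    not be in this dict, or it will be returned at distance 0.
    """
    lat0, lon0 = new_station
    hits = []
    for sid, (lat, lon) in stations.items():
        d = haversine_m(lat0, lon0, lat, lon)
        if d <= radius_m:
            hits.append((sid, d))
    return sorted(hits, key=lambda pair: pair[1])
```

The resulting (station, distance) pairs can serve as the explanatory variable when comparing ridership before and after an opening.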

Question 2

How does proximity to NYC subway stations affect Citibike ridership?

  • NACTO's bike share guide recommends placing bike stations near public transit.

Data Sources

Citibike Data

Source

Ridership Data

Observations: 9,884,307
Variables: 11
$ tripduration     <dbl> 997, 1904, 305, 250, 464, 1118, 394, 1449, 42...
$ starttime        <dttm> 2013-07-01 06:00:16, 2013-07-01 06:00:30, 20...
$ stoptime         <dttm> 2013-07-01 06:16:53, 2013-07-01 06:32:14, 20...
$ startstationid   <chr> "436", "294", "385", "271", "477", "488", "30...
$ startstationname <chr> "Hancock St & Bedford Ave", "Washington Squar...
$ endstationid     <chr> "467", "375", "440", "390", "522", "497", "32...
$ endstationname   <chr> "Dean St & 4 Ave", "Mercer St & Bleecker St",...
$ bikeid           <dbl> 16199, 20281, 18143, 16370, 15497, 15502, 161...
$ usertype         <chr> "Subscriber", "Subscriber", "Subscriber", "Su...
$ birthyear        <dbl> 1979, 1949, 1988, 1962, 1975, 1957, 1963, 195...
$ gender           <chr> "2", "1", "1", "1", "1", "1", "2", "1", "1", ...

Citibike Data

Summary Statistics

Time: weekday mornings, 6 to 10 am
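Restricting the trip table to this window and turning trips into daily counts could look like the sketch below. It assumes trips arrive as `(starttime, startstationid)` pairs mirroring two columns of the ridership data, and it treats the window as 06:00:00 to 09:59:59 on Monday to Friday; whether 10:00 was inclusive in the original analysis is an assumption. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def morning_weekday_counts(trips):
    """Count trips per (start station, date) for weekday starts from 06:00 to 09:59.

    trips: iterable of (starttime, startstationid) pairs, starttime a datetime.
    """
    counts = Counter()
    for starttime, station in trips:
        is_weekday = starttime.weekday() < 5  # Monday=0 ... Friday=4
        in_window = 6 <= starttime.hour < 10  # excludes trips starting at 10:00 or later
        if is_weekday and in_window:
            counts[(station, starttime.date())] += 1
    return counts
```

For example, two Monday 6 am trips from station "436" yield a count of 2 for that station on that date, while Saturday trips and trips after 10 am are dropped.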

Date Range:

[1] "2013-07-01 UTC" "2017-07-31 UTC"

Stations:

[1] 749

Citibike Data

Exploration

Citibike Data

Exploration (Cont.)

Citibike Data

  • Only the stations labeled in red are used in the analysis.

  • These stations have the longest ridership histories and more homogeneous surroundings.

  • Start the analysis in January 2014, giving a six-month "burn-in" period after the first trips in July 2013.
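The burn-in and station selection can be applied to the daily counts as in the sketch below. The January 1, 2014 cutoff comes from the slide; the `min_days` threshold is a hypothetical stand-in, since the exact rule behind the red-labeled stations is not stated.

```python
from datetime import date

ANALYSIS_START = date(2014, 1, 1)  # six months after the first trips on 2013-07-01


def apply_burn_in_and_select(counts, min_days):
    """Drop rows before 2014, then keep stations with at least min_days days of data.

    counts: dict mapping (station_id, date) -> ride count.
    min_days: hypothetical selection threshold, not the criterion used in the talk.
    """
    kept = {key: n for key, n in counts.items() if key[1] >= ANALYSIS_START}
    days_per_station = {}
    for station, _ in kept:
        days_per_station[station] = days_per_station.get(station, 0) + 1
    eligible = {s for s, n in days_per_station.items() if n >= min_days}
    return {key: n for key, n in kept.items() if key[0] in eligible}
```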

Historic Weather Data

NOAA Data Request

Observations: 1,673
Variables: 6
$ day_trip <dttm> 2013-01-01, 2013-01-02, 2013-01-03, 2013-01-04, 2013...
$ PRCP     <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,...
$ SNOW     <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
$ SNWD     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ TMIN     <int> 26, 22, 24, 30, 32, 34, 37, 35, 39, 40, 37, 42, 43, 3...
$ TMAX     <int> 40, 33, 32, 37, 42, 46, 45, 48, 49, 47, 46, 47, 50, 5...
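Because the weather table has one row per day, it can be attached to the per-station daily counts by date. The sketch below assumes weather rows are keyed by date with the NOAA column names shown above; summarizing temperature as the midpoint of `TMIN` and `TMAX` is an assumption, since the slides only list "Temperature". The function name is hypothetical.

```python
from datetime import date


def join_weather(counts: dict[tuple[str, date], int], weather: dict[date, dict]) -> list[dict]:
    """Attach same-day weather to each (station, date) ride count.

    weather maps date -> dict with NOAA keys PRCP, SNOW, TMIN, TMAX.
    Days missing from the weather table are skipped.
    """
    rows = []
    for (station, day), rides in sorted(counts.items()):
        w = weather.get(day)
        if w is None:
            continue
        rows.append({
            "station": station,
            "date": day,
            "rides": rides,
            "temp": (w["TMIN"] + w["TMAX"]) / 2,  # assumed summary: daily midpoint
            "snow": w["SNOW"],
            "prcp": w["PRCP"],
        })
    return rows
```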

Subway Entrance Data
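A natural proximity feature from the entrance data is the distance from each bike station to its closest subway entrance. The sketch below assumes both tables provide latitude/longitude in degrees and uses a brute-force search, which is adequate for a few hundred stations against a few thousand entrances. Function names and inputs are hypothetical.

```python
import math


def _haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_entrance_m(stations, entrances):
    """Map each bike station id to the distance (m) of its closest subway entrance.

    stations: dict of station id -> (lat, lon); entrances: list of (lat, lon).
    """
    return {
        sid: min(_haversine_m(lat, lon, elat, elon) for elat, elon in entrances)
        for sid, (lat, lon) in stations.items()
    }
```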

Regression Analysis

Model Definition

  • Fit a hierarchical (mixed-effects) Poisson model to account for variability in ridership throughout the year.
  • Outcome: number of rides

Effects:

  • Fixed: temperature, snowfall, days since January 1, 2014, previous day's ride count
  • Random: station, week of year
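Written out, one reading of this specification is the log-linear model below, where $y_{s,t}$ is the ride count at station $s$ on day $t$, $d_t$ is days since January 1, 2014, and $w(t)$ is the week of year. Entering the previous day's count on its raw scale (rather than, say, $\log(1 + y_{s,t-1})$) and using normal random intercepts are assumptions, not details stated on the slide.

$$
\begin{aligned}
y_{s,t} &\sim \operatorname{Poisson}(\lambda_{s,t}) \\
\log \lambda_{s,t} &= \beta_0 + \beta_1\,\mathrm{temp}_t + \beta_2\,\mathrm{snow}_t + \beta_3\,d_t + \beta_4\,y_{s,t-1} + u_s + v_{w(t)} \\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_w \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
$$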

Model Performance

Specific Group of Stations

Spatial Analysis

Models

Is proximity to subway stations associated with a decrease in Citibike usage?

Fitting the Variogram

No Regressors
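With no regressors, the variogram is computed directly on a per-station ridership value (for example, mean daily rides); with regressors, the same computation would be applied to model residuals. The sketch below is the classical Matheron estimator of the empirical semivariogram, $\hat\gamma(h) = \frac{1}{2|N(h)|}\sum_{(i,j)\in N(h)} (z_i - z_j)^2$, binned by distance. It assumes projected coordinates in metres; fitting a parametric variogram model to these points is a separate step not shown. The function name is hypothetical.

```python
import math


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Matheron estimator of the semivariogram in distance bins of width bin_width.

    points: list of (x, y) in projected units; values: per-station numbers.
    Returns (bin_centre, gamma, n_pairs) for each non-empty bin.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            h = math.dist(points[i], points[j])
            k = int(h // bin_width)
            if k < n_bins:  # pairs beyond the last bin are ignored
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]
```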

Closest Subway Features

Next Steps

Further Analysis

Regression

  • Aggregate ridership for the month before and the month after a new station opens, compute the difference, and model that difference as a function of distance from the new station.
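A first pass at this before/after design could compute, for each nearby station, the change in mean daily rides around an opening date and then regress that change on distance. The sketch below assumes a 30-day window stands in for "a month" and uses plain ordinary least squares; a count model or mixed model could replace OLS. Function names, dates, and inputs are hypothetical.

```python
from datetime import date, timedelta


def before_after_change(daily: dict[date, int], opening: date, window_days: int = 30) -> float:
    """Mean daily rides in the window after `opening` minus the window before it.

    daily maps date -> ride count for one existing station.
    """
    start, end = opening - timedelta(days=window_days), opening + timedelta(days=window_days)
    before = [n for d, n in daily.items() if start <= d < opening]
    after = [n for d, n in daily.items() if opening <= d < end]
    if not before or not after:
        raise ValueError("need data on both sides of the opening date")
    return sum(after) / len(after) - sum(before) / len(before)


def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b
```

A positive slope `b` would suggest that stations closer to the new station lost more riders, consistent with nearby stations sharing demand.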

Spatial Analysis

  • How can we tell if a predictor is 'significant'?